ABSTRACT

Nested sampling has emerged as a valuable tool for Bayesian analysis, in particular for 
determining the Bayesian evidence. The method is based on a specific type of random 
sampling of the likelihood function and prior volume of the parameter space. I study 
the statistical uncertainty in the evidence computed with nested sampling. I examine 
the uncertainty estimator from Skilling (2004, 2006) and introduce a new estimator 
based on a detailed analysis of the statistical properties of nested sampling. Both 
perform well in test cases and make it possible to obtain the statistical uncertainty in 
the evidence with no additional computational cost. 



1 INTRODUCTION 



Bayesian statistics provide a general framework for confronting models with data (e.g., Gelman et al. 1995). Constraints on
model parameters are quantified by the posterior distribution for the parameters given the data. The overall quality of a
model is characterised by the integral of the likelihood weighted by the prior (i.e., the normalisation of the posterior),
which is known as the evidence. The Bayesian evidence is especially valuable as an objective means of comparing models
with different numbers of parameters.

The challenge with Bayesian statistics is finding an efficient method to explore the posterior and/or compute the evidence.
The posterior may occupy many dimensions and have a complicated (and possibly multi-modal) shape. Markov Chain Monte
Carlo (MCMC) methods have become popular as a way to generate samples of points drawn from arbitrary posteriors (e.g.,
Gelman et al. 1995). MCMC samples are great for inferring parameter values and ranges, but they cannot be used by
themselves to evaluate the evidence. MCMC methods can be extended to yield the evidence via thermodynamic integration
(see Gelman & Meng 1998, and references therein), but that approach can be computationally intensive.

Skilling (2004, 2006) recently introduced an approach called nested sampling that is specifically designed to compute
the Bayesian evidence. Roughly speaking, the idea is to peel away layers of constant likelihood one by one, and combine the
likelihood values with the volumes of the layers to obtain the evidence. The volumes may be difficult to determine, but they
can be estimated statistically if the likelihood layers are chosen in a particular way (see § 2.1 for details). While the analysis
focuses on the evidence, it can yield a set of points drawn from the posterior as a natural by-product.

There are two practical challenges with nested sampling. The first is that at each step we need to generate a new point
drawn from the region inside an iso-likelihood surface. A lot of the literature on nested sampling addresses methods for picking
new points. Mukherjee et al. (2006) discuss drawing points inside a multi-dimensional ellipsoid that encloses the likelihood
surface at $\mathcal{L}_0$, and ignoring any that have $\mathcal{L} < \mathcal{L}_0$. Shaw et al. (2007), Feroz & Hobson (2008), and Feroz et al. (2009) develop
methods that use multiple ellipsoids to handle more complicated likelihood functions, including multi-modal distributions.
Chopin & Robert (2008) point out that importance sampling can be powerful if one can find a distribution that is easy
to draw from and approximates the likelihood distribution moderately well. Betancourt (2010) advocates using constrained
Hamiltonian Monte Carlo methods to evolve a new point from one of the known points. All of those methods keep the core
approach of peeling away likelihood layers in sequence from the outside in, and differ only in the details of picking new points.
Brewer et al. (2009) introduce a variant they call diffusive nested sampling that does not always require the steps to proceed
from the outside in.

The second challenge is that nested sampling, like any stochastic sampling procedure, has some statistical uncertainty in
its results. General proofs establish that nested sampling converges to the correct evidence with an error that scales as $N^{-1/2}$
where $N$ is a measure of the computational effort (Chopin & Robert 2008; Skilling 2009). However, in practical applications
it would be nice to have a specific estimate of the statistical uncertainty in the evidence. That is the purpose of this paper.
I first review the nested sampling procedure (§ 2.1) and a popular estimator from Skilling (2004, 2006) for the statistical
uncertainty in the evidence (§ 2.2). I then introduce a new uncertainty estimator based on an analysis of the statistical
properties of the nested sampling procedure (§§ 2.3 and 2.4). I use numerical tests to assess the estimators and provide some
guidelines for choosing parameters that control the number of samplings (§ 3). The results presented here are applicable to
any implementation of nested sampling that uses the conventional approach of peeling away likelihood layers in one direction
only (i.e., to all current methods other than diffusive nested sampling).



2 THEORETICAL FRAMEWORK 
2.1 Nested sampling 



To establish the concepts and notation, it is useful to review the nested sampling algorithm (see Skilling 2004, 2006 for
details). Consider a likelihood function $\mathcal{L}(\theta)$ defined on a parameter space $\theta$, which may be multi-dimensional.^1 Priors on the
parameters are specified by $\pi(\theta)$, which is normalised such that $\int \pi(\theta)\,d\theta = 1$. With simple flat priors, $\pi(\theta) = 1/V$ where $V$
is the volume spanned by the allowed range of parameters, but the framework can incorporate non-flat priors as well. The
Bayesian evidence is then

$$ Z = \int \mathcal{L}(\theta)\,\pi(\theta)\,d\theta \tag{1} $$

Define a function $X(L)$ to be the fraction of the prior volume that lies at a likelihood level higher than $L$:

$$ X(L) = \int_{\mathcal{L}(\theta) > L} \pi(\theta)\,d\theta \tag{2} $$

This is a monotonic decreasing function, with $X(0) = 1$. In principle, we can invert to find $L(X)$ and then rewrite eq. (1) as

$$ Z = \int_0^1 L(X)\,dX \tag{3} $$

Now suppose we can generate a sample of $N_{\rm nest}$ points $\{L_i, X_i\}$ such that the likelihood increases while the fractional volume
decreases with the index $i$; in other words, $L_i > L_{i-1}$ and $X_i < X_{i-1}$, and we can consider $L_0 = 0$ and $X_0 = 1$. Then we can
discretise the integral to estimate the evidence as

$$ Z = \sum_{i=1}^{N_{\rm nest}} L_i\,(X_{i-1} - X_i) \tag{4} $$

Later it will be useful to consider the buildup of evidence by examining the "partial evidence" due to the contribution from
the first $k$ steps:

$$ Z_k = \sum_{i=1}^{k} L_i\,(X_{i-1} - X_i) \tag{5} $$

There is some error in eq. (4) associated with approximating the integral as a sum, but it is generally small compared with
the statistical uncertainty (Skilling 2006). There is also some error induced by truncating the sum, which is discussed in § 2.4.
The heart of nested sampling is the method for generating the likelihood sampling $\{L_i\}$ and volume sampling $\{X_i\}$. The
idea is that it is (relatively) straightforward to produce a relevant likelihood sampling, but it can be difficult to determine
the associated volumes so we treat those statistically. Consider some likelihood threshold $\mathcal{L}_0$ enclosing a volume $V_0$. Suppose
we have $M$ points drawn uniformly from that volume. In general there will be some (slightly) higher likelihood threshold
$\mathcal{L}_1 > \mathcal{L}_0$ that encloses all $M$ points. Statistically speaking, we can estimate the smaller enclosed volume as $V_1 = V_0 t_1$ where
$t_1$ is the largest of $M$ random numbers drawn uniformly between 0 and 1. In other words, $t_1$ is drawn from the probability
distribution for the largest of $M$ uniform deviates between 0 and 1, which is

$$ p(t) = M\,t^{M-1} \quad \text{for } t \in [0,1] \tag{6} $$

We can generalise to non-uniform priors by defining the "volumes" to be integrals of the priors over the relevant regions and
having the $M$ points drawn from the prior distribution. The probability distribution for $t_1$ remains unchanged.
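As a small illustration of eq. (6), the shrinkage factor can be drawn by inverse-transform sampling: the cumulative distribution of $p(t)$ is $t^M$, so $t = u^{1/M}$ for a uniform deviate $u$. The sketch below is not part of the original analysis; the function names are hypothetical, and the brute-force variant is included only to show that both routes sample the same distribution, whose mean is $M/(M+1)$.

```python
import random

def draw_shrinkage(M, rng=random):
    """Draw t from p(t) = M t^(M-1) on [0, 1] (eq. 6) by inverse-transform sampling.

    The cumulative distribution is t^M, so t = u^(1/M) for a uniform deviate u.
    """
    return rng.random() ** (1.0 / M)

def draw_shrinkage_bruteforce(M, rng=random):
    """Same distribution, drawn literally as the largest of M uniform deviates."""
    return max(rng.random() for _ in range(M))

if __name__ == "__main__":
    rng = random.Random(1)
    M, n = 20, 20_000
    mean_fast = sum(draw_shrinkage(M, rng) for _ in range(n)) / n
    mean_slow = sum(draw_shrinkage_bruteforce(M, rng) for _ in range(n)) / n
    # Both sample means should lie close to the analytic mean M / (M + 1).
    print(mean_fast, mean_slow, M / (M + 1))
```

The inverse-transform form needs one uniform deviate per step rather than $M$, which matters when many volume realisations are generated.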

That idea leads to the following procedure. Begin with $M$ points (known as "live" points) drawn from the
full prior distribution. Let the likelihoods of the live points be $\mathcal{L}_\ell$ for $\ell = 1, \ldots, M$. Then at step $k$ of the nested sampling:

(i) Extract the lowest likelihood live point and call it the $k$-th sampled point: $L_k = \min(\mathcal{L}_\ell)$.

(ii) Estimate the associated volume as

$$ X_k = X_{k-1}\,t_k \tag{7} $$

where $t_k$ is a random number drawn from $p(t)$ in eq. (6).

(iii) Replace the extracted live point with a new point that is drawn from the priors but restricted to the region $\mathcal{L}(\theta) > L_k$.

^1 To simplify the notation, I do not explicitly indicate vectors or write the data dependence in the likelihood function.

Iterating this process for a total of $N_{\rm nest}$ steps yields a likelihood sampling $\{L_i\}$ and volume sampling $\{X_i\}$ that can be
combined using eq. (4) to estimate the evidence. This is the conventional nested sampling technique as defined by Skilling
(2004, 2006). The various implementations of nested sampling mainly differ in the way they find the replacement point in
step (iii).
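The three steps above can be sketched in a few lines for a toy problem. The block below is only an illustration under stated assumptions, not the implementation used for the tests in this paper: it assumes a unit Gaussian likelihood in a cubic flat prior, and it handles step (iii) with a shortcut that works only because the iso-likelihood surfaces of that toy likelihood are spheres (draw uniformly in the ball, reject points outside the cube). All function names and default values are hypothetical.

```python
import heapq
import math
import random

def likelihood(theta):
    """Unit Gaussian centred on the origin (the toy target assumed here)."""
    d = len(theta)
    r2 = sum(x * x for x in theta)
    return (2.0 * math.pi) ** (-0.5 * d) * math.exp(-0.5 * r2)

def draw_prior(d, s, rng):
    """Uniform draw from the prior cube of side s centred on the origin."""
    return [rng.uniform(-0.5 * s, 0.5 * s) for _ in range(d)]

def draw_inside(r, d, s, rng):
    """Uniform draw from the prior restricted to |theta| < r.

    Toy shortcut: draw uniformly in the ball of radius r and reject any
    point that falls outside the prior cube.
    """
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        radius = r * rng.random() ** (1.0 / d)
        theta = [radius * x / norm for x in g]
        if all(abs(x) <= 0.5 * s for x in theta):
            return theta

def nested_sampling(M, n_steps, d=4, s=10.0, seed=0):
    """Return a likelihood sampling {L_i} and one volume sampling {X_i}."""
    rng = random.Random(seed)
    live = []
    for _ in range(M):
        theta = draw_prior(d, s, rng)
        live.append((likelihood(theta), theta))
    heapq.heapify(live)                      # lowest likelihood sits at live[0]
    L, X, x = [], [], 1.0
    for _ in range(n_steps):
        L_k, theta_k = heapq.heappop(live)   # step (i): extract the lowest point
        x *= rng.random() ** (1.0 / M)       # step (ii): X_k = X_{k-1} t_k
        r = math.sqrt(sum(c * c for c in theta_k))
        theta_new = draw_inside(r, d, s, rng)  # step (iii): constrained replacement
        heapq.heappush(live, (likelihood(theta_new), theta_new))
        L.append(L_k)
        X.append(x)
    return L, X

def evidence(L, X):
    """Discretised evidence of eq. (4), with X_0 = 1."""
    z, x_prev = 0.0, 1.0
    for L_i, x_i in zip(L, X):
        z += L_i * (x_prev - x_i)
        x_prev = x_i
    return z

if __name__ == "__main__":
    L, X = nested_sampling(M=400, n_steps=4100, seed=1)
    print("V Z =", 10.0 ** 4 * evidence(L, X))  # should scatter around 1
```

In a realistic problem the shape of the iso-likelihood surface is unknown, so `draw_inside` would be replaced by one of the replacement schemes cited in § 1; the rest of the loop is unchanged.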



2.2 Skilling's error analysis 



To estimate the statistical uncertainty associated with stochastic sampling, Skilling (2004, 2006) invokes information theory.
In general the posterior $p(\theta)$ is (much) narrower than the prior $\pi(\theta)$, and we can characterise the difference in terms of the
"information gain" (also known as the Kullback-Leibler divergence; see Kullback 1959)

$$ H = \int p(\theta)\,\ln\!\left[\frac{p(\theta)}{\pi(\theta)}\right] d\theta \tag{8} $$

By Bayes's theorem, $p(\theta) = \mathcal{L}(\theta)\,\pi(\theta)/Z$ so we can write

$$ H = \frac{1}{Z}\int \mathcal{L}(\theta)\,\pi(\theta)\,\ln\!\left[\frac{\mathcal{L}(\theta)}{Z}\right] d\theta = \frac{1}{Z}\int_0^1 L(X)\,\ln L(X)\,dX - \ln Z \tag{9} $$

using the same change of variables as in eq. (3). This integral can be discretised just like the evidence integral, so it is
straightforward to estimate $H$ from a given sampling $\{L_i, X_i\}$.

Skilling (2004, 2006) argues that the number of steps needed to reach the posterior is approximately $HM$ where $M$ is the
number of live points, and that the dominant statistical uncertainty arises from Poisson fluctuations $\sqrt{HM}$ in the number of
steps. Since each step compresses $\ln X$ by about $1/M$, such a fluctuation corresponds to an uncertainty in $\ln Z$ of about
$\sqrt{HM}/M = \sqrt{H/M}$. Note that Skilling argues that $\ln Z$, and not $Z$ itself, is the
quantity likely to have a fairly symmetric and quasi-Gaussian distribution. However, if the uncertainty is small (specifically,
$\sigma_Z/Z \ll 1$), then $Z$ itself will also be Gaussian distributed and Skilling's estimate corresponds to a fractional uncertainty in
the evidence of

$$ \frac{\sigma_Z}{Z} \approx \sqrt{\frac{H}{M}} \tag{10} $$

This estimator is often used in nested sampling applications, but its accuracy has not (to my knowledge) been rigorously
established.
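To make the estimator concrete, the following sketch discretises eq. (9) with the same weights $X_{i-1} - X_i$ used for the evidence in eq. (4), and then applies eq. (10). It is a minimal illustration rather than a reference implementation; the function names are hypothetical, and the guard against negative $H$ is an added assumption to protect the square root from round-off.

```python
import math

def information_gain(L, X):
    """Discretised H of eq. (9) and Z of eq. (4), assuming X_0 = 1.

    L and X are the likelihood and volume samplings, in step order.
    Returns the pair (H, Z).
    """
    z, s, x_prev = 0.0, 0.0, 1.0
    for L_i, x_i in zip(L, X):
        w = x_prev - x_i                      # volume of the i-th shell
        z += L_i * w
        if L_i > 0.0:                         # L ln L -> 0 as L -> 0
            s += L_i * math.log(L_i) * w
        x_prev = x_i
    return s / z - math.log(z), z

def skilling_fractional_uncertainty(L, X, M):
    """Fractional uncertainty sqrt(H/M) of eq. (10)."""
    H, _ = information_gain(L, X)
    return math.sqrt(max(H, 0.0) / M)         # clamp tiny negative round-off

if __name__ == "__main__":
    M = 100
    X = [math.exp(-i / M) for i in range(1, 2001)]   # mean compression per step
    L = [math.exp(-100.0 * x) for x in X]            # arbitrary peaked example
    print(information_gain(L, X), skilling_fractional_uncertainty(L, X, M))
```

For very small likelihood values it is safer to work with $\ln L$ and a log-sum-exp accumulation, which this sketch omits for brevity.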



2.3 Moment-based error analysis 



Skilling (2006) mentions that it should be possible to obtain a more detailed estimate of the statistical uncertainty by
computing the mean and variance of $Z$ over all possible realisations of the volume sampling $\{X_i\}$, but he does not carry out
the analysis. The goal of this section is to compute $\langle Z \rangle$ and $\langle Z^2 \rangle$ to obtain a new estimator for $\sigma_Z$. Since this estimator
is based on the standard deviation, it is most useful when $Z$ is Gaussian distributed, i.e., when the uncertainties are small
($\sigma_Z/Z \ll 1$). This does not seem like a significant limitation, though, because in many applications it will be desirable to
achieve small uncertainties.

It is convenient to use eq. (7) to write the volumes as

$$ X_i = \prod_{j=1}^{i} t_j \tag{11} $$

The advantage is that, while the $X_i$'s are statistically correlated, the $t_i$'s are independent, and that allows us to decompose the
joint probability density for all the $t_i$'s into a product:

$$ p(t_1, t_2, t_3, \ldots) = p(t_1)\,p(t_2)\,p(t_3) \cdots \tag{12} $$

where $p(t)$ is from eq. (6). We can then write the average of any quantity $f$ over all realisations of the volume sampling as

$$ \langle f \rangle = \int f(t_1, t_2, t_3, \ldots)\; p(t_1)\,p(t_2)\,p(t_3) \cdots\, dt_1\,dt_2\,dt_3 \cdots \tag{13} $$

It is important to understand that such an average only spans the volume sampling; at this point we are not considering
different realisations of the likelihood sampling. As part of this analysis we need moments of the $t$ probability distribution,

$$ \langle t^n \rangle = \int_0^1 t^n\,p(t)\,dt = \frac{M}{M+n} \tag{14} $$

Combining eqs. (5) and (11), we can write the (partial) evidence in terms of the $t_i$'s as

$$ Z_k = \sum_{i=1}^{k} L_i\,(1 - t_i) \prod_{j=1}^{i-1} t_j = \sum_{i=1}^{k} L_i \left( \prod_{j=1}^{i-1} t_j - \prod_{j=1}^{i} t_j \right) \tag{15} $$

Since the terms in the products are statistically independent, we can factorise the average of a product and write

$$ \left\langle \prod_{j=1}^{n} t_j \right\rangle = \prod_{j=1}^{n} \langle t_j \rangle = \langle t \rangle^n \tag{16} $$

for any $n \in [1, N_{\rm nest}]$. This allows us to write the average of the evidence after any step $k$ as

$$ \langle Z_k \rangle = \sum_{i=1}^{k} L_i \left( \left\langle \prod_{j=1}^{i-1} t_j \right\rangle - \left\langle \prod_{j=1}^{i} t_j \right\rangle \right) = \sum_{i=1}^{k} L_i \left( \langle t \rangle^{i-1} - \langle t \rangle^{i} \right) = \frac{1}{M} \sum_{i=1}^{k} L_i \left( \frac{M}{M+1} \right)^{i} \tag{17} $$

This is a simple expression for the (partial) evidence averaged over all possible realisations of the volume sampling (given a
particular likelihood sampling $\{L_i\}$). Obviously the final evidence is obtained just by evaluating at $k = N_{\rm nest}$.
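To compute the second moment it is convenient to begin with the partial evidence from eq. (5):

$$
\begin{aligned}
Z_k^2 &= \left[ \sum_{i=1}^{k} L_i\,(1 - t_i) \prod_{j=1}^{i-1} t_j \right] \left[ \sum_{i'=1}^{k} L_{i'}\,(1 - t_{i'}) \prod_{j'=1}^{i'-1} t_{j'} \right] \\
&= Z_{k-1}^2 + \left[ L_k\,(1 - t_k) \prod_{j=1}^{k-1} t_j \right]^2 + 2 \left[ \sum_{i=1}^{k-1} L_i\,(1 - t_i) \prod_{j=1}^{i-1} t_j \right] \left[ L_k\,(1 - t_k) \prod_{j'=1}^{k-1} t_{j'} \right]
\end{aligned}
\tag{18}
$$

In the second line I separate the joint sum over $i, i' \le k$ into three components. The first component includes all terms with
$i, i' \le k-1$, so we can immediately recognise it as $Z_{k-1}^2$. The second component is the term with $i = i' = k$. The third
component includes all terms in which one index equals $k$ while the other runs over values $\le k-1$. Since we can interchange
$i$ and $i'$, there is a leading factor of 2.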

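It takes a few steps to evaluate the averages. First consider the second term in eq. (18). Writing out the products of $t_i$'s
and collecting terms yields

$$ L_k^2 \left\langle \prod_{j=1}^{k-1} t_j^2 - 2\,t_k \prod_{j=1}^{k-1} t_j^2 + \prod_{j=1}^{k} t_j^2 \right\rangle = L_k^2 \left( 1 - 2 \langle t \rangle + \langle t^2 \rangle \right) \langle t^2 \rangle^{k-1} = \frac{2\,L_k^2}{M(M+1)} \left( \frac{M}{M+2} \right)^{k} \tag{19} $$

Now consider the third term in eq. (18). We can rewrite the products, taking care to distinguish the $t$'s that appear twice in
a product from those that appear just once, and thus obtain

$$
\begin{aligned}
& 2 L_k \sum_{i=1}^{k-1} L_i \left\langle (1 - t_i)(1 - t_k) \prod_{j=1}^{i-1} t_j^2 \prod_{j'=i}^{k-1} t_{j'} \right\rangle \\
&\quad = 2 L_k \sum_{i=1}^{k-1} L_i \left( \langle t^2 \rangle^{i-1} \langle t \rangle^{k-i} - \langle t^2 \rangle^{i} \langle t \rangle^{k-i-1} - \langle t^2 \rangle^{i-1} \langle t \rangle^{k-i+1} + \langle t^2 \rangle^{i} \langle t \rangle^{k-i} \right) \\
&\quad = \frac{2}{M(M+1)}\, L_k\, \langle t \rangle^{k} \sum_{i=1}^{k-1} L_i \left( \frac{\langle t^2 \rangle}{\langle t \rangle} \right)^{i}
\end{aligned}
\tag{20}
$$

Notice that eq. (19) has the same form as each term in the sum in eq. (20), but with index $i = k$. So when we insert eqs. (19)
and (20) back into eq. (18), we can write

$$ \langle Z_k^2 \rangle = \langle Z_{k-1}^2 \rangle + \frac{2}{M(M+1)}\, L_k\, \langle t \rangle^{k} \sum_{i=1}^{k} L_i \left( \frac{\langle t^2 \rangle}{\langle t \rangle} \right)^{i} \tag{21} $$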



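With this expression for the second moment of the partial evidence, we see that the second moment of the full evidence can
be written as

$$ \langle Z^2 \rangle = \frac{2}{M(M+1)} \sum_{k=1}^{N_{\rm nest}} L_k\, \langle t \rangle^{k} \sum_{i=1}^{k} L_i \left( \frac{\langle t^2 \rangle}{\langle t \rangle} \right)^{i} = \frac{2}{M(M+1)} \sum_{k=1}^{N_{\rm nest}} L_k \left( \frac{M}{M+1} \right)^{k} \sum_{i=1}^{k} L_i \left( \frac{M+1}{M+2} \right)^{i} \tag{22} $$

Combining eqs. (17) and (22) in the usual way yields a new estimator for the statistical uncertainty in the evidence:

$$ \sigma_Z^2 = \langle Z^2 \rangle - \langle Z \rangle^2 = \frac{2}{M(M+1)} \sum_{k=1}^{N_{\rm nest}} L_k \left( \frac{M}{M+1} \right)^{k} \sum_{i=1}^{k} L_i \left( \frac{M+1}{M+2} \right)^{i} - \frac{1}{M^2} \left[ \sum_{k=1}^{N_{\rm nest}} L_k \left( \frac{M}{M+1} \right)^{k} \right]^2 \tag{23} $$

For comparison, rewriting eq. (10) in the current notation yields the following expression for Skilling's uncertainty estimator:

$$ \frac{\sigma_Z^2}{Z^2} = \frac{1}{M} \left\{ \frac{\sum_{i=1}^{N_{\rm nest}} L_i \ln L_i \left( \frac{M}{M+1} \right)^{i}}{\sum_{j=1}^{N_{\rm nest}} L_j \left( \frac{M}{M+1} \right)^{j}} - \ln \left[ \frac{1}{M} \sum_{j=1}^{N_{\rm nest}} L_j \left( \frac{M}{M+1} \right)^{j} \right] \right\} \tag{24} $$
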
On the surface these two expressions look quite different, so it is interesting to compare them in quantitative examples. 
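Because the inner sum in eq. (22) can be accumulated as the outer sum advances, eqs. (17) and (23) can be evaluated in a single pass over the likelihood sampling. The sketch below is one possible way to code that, not a prescribed implementation; the function name is hypothetical.

```python
def evidence_moments(L, M):
    """Return (<Z>, sigma_Z^2) over volume realisations, from eqs. (17) and (23).

    L is the likelihood sampling in step order and M the number of live points.
    """
    a = M / (M + 1.0)            # <t>
    b = (M + 1.0) / (M + 2.0)    # <t^2> / <t>
    a_k = b_k = 1.0
    first = second = inner = 0.0
    for L_k in L:
        a_k *= a                       # a^k
        b_k *= b                       # b^k
        inner += L_k * b_k             # sum over i <= k of L_i b^i
        first += L_k * a_k             # sum over k of L_k a^k
        second += L_k * a_k * inner    # double sum of eq. (22)
    mean = first / M                               # eq. (17)
    mean_sq = 2.0 * second / (M * (M + 1.0))       # eq. (22)
    return mean, mean_sq - mean * mean             # eq. (23)

if __name__ == "__main__":
    # Single-step check: Z = L_1 (1 - t_1), so the variance is L_1^2 Var(t).
    mean, var = evidence_moments([2.0], 3)
    print(mean, var)
```

The variance is obtained as a difference of two nearly equal numbers, so when $\sigma_Z/Z$ is very small or the likelihoods span many orders of magnitude it may help to rescale the $L_i$ by their maximum before calling the function.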
2.4 Handling the remainder



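When the nested sampling procedure is complete, there is some (small) remaining volume, $X_{N_{\rm nest}}$, whose contribution to the
evidence is neglected in eq. (4). While we can make its contribution arbitrarily small by taking enough steps (see § 3.4), we
can also include it at the expense of making the analytic expressions slightly more complicated.

Suppose we truncate nested sampling after step $k$ and compute the partial "nested evidence" $Z_k$ from eq. (5). We can
estimate the remaining evidence as a product of the remaining volume, $X_k$, and the mean likelihood within that volume.
Since the live points are drawn uniformly from $X_k$, we can estimate the mean likelihood from the live points as

$$ \bar{\mathcal{L}}^{(k)} = \frac{1}{M} \sum_{\ell=1}^{M} \mathcal{L}_\ell \tag{25} $$

Here the overbar distinguishes this average over live points from an average over volume realisations, and the superscript is
a reminder that the average is taken after step $k$. Thus the "live evidence" is

$$ Z_k^{\rm live} = \bar{\mathcal{L}}^{(k)}\, X_k \tag{26} $$

Averaging the live evidence over all volume realisations yields

$$ \langle Z_k^{\rm live} \rangle = \bar{\mathcal{L}}^{(k)}\, \langle X_k \rangle = \bar{\mathcal{L}}^{(k)} \left( \frac{M}{M+1} \right)^{k} \tag{27} $$

The second moment is

$$ \left\langle \left( Z_k^{\rm live} \right)^2 \right\rangle = \left( \bar{\mathcal{L}}^{(k)} \right)^2 \langle X_k^2 \rangle = \left( \bar{\mathcal{L}}^{(k)} \right)^2 \left\langle \prod_{j=1}^{k} t_j^2 \right\rangle = \left( \bar{\mathcal{L}}^{(k)} \right)^2 \left( \frac{M}{M+2} \right)^{k} \tag{28} $$

so the statistical uncertainty in the live evidence is

$$ \sigma^2_{Z_k^{\rm live}} = \left( \bar{\mathcal{L}}^{(k)} \right)^2 \left( \frac{M}{M+1} \right)^{k} \left[ \left( \frac{M+1}{M+2} \right)^{k} - \left( \frac{M}{M+1} \right)^{k} \right] \tag{29} $$

Now consider the estimate of the total evidence after step $k$,

$$ Z_k^{\rm tot} = Z_k + Z_k^{\rm live} \tag{30} $$

The average over volume realisations is simply obtained from eqs. (17) and (27). The statistical uncertainty in $Z_k^{\rm tot}$ is

$$ \sigma^2_{Z_k^{\rm tot}} = \sigma^2_{Z_k} + \sigma^2_{Z_k^{\rm live}} + 2 \left( \langle Z_k Z_k^{\rm live} \rangle - \langle Z_k \rangle \langle Z_k^{\rm live} \rangle \right) \tag{31} $$

The term in parentheses accounts for the fact that $Z_k^{\rm live}$ and $Z_k$ are not independent because they both involve the same
volume sampling. The cross term can be evaluated using an analysis similar to that in eq. (20), which yields

$$ \langle Z_k Z_k^{\rm live} \rangle = \frac{\bar{\mathcal{L}}^{(k)}}{M+1} \left( \frac{M}{M+1} \right)^{k} \sum_{i=1}^{k} L_i \left( \frac{M+1}{M+2} \right)^{i} \tag{32} $$

Putting the pieces together, we find that including the live evidence increases the statistical uncertainty in the total evidence
according to

$$ \sigma^2_{Z_k^{\rm tot}} = \sigma^2_{Z_k} + \Delta\sigma^2_{Z_k} \tag{33} $$

where the original uncertainty is given in eq. (23) while the increase is

$$ \Delta\sigma^2_{Z_k} = \left( \bar{\mathcal{L}}^{(k)} \right)^2 \left( \frac{M}{M+1} \right)^{k} \left[ \left( \frac{M+1}{M+2} \right)^{k} - \left( \frac{M}{M+1} \right)^{k} \right] + 2\, \bar{\mathcal{L}}^{(k)} \left( \frac{M}{M+1} \right)^{k} \sum_{i=1}^{k} L_i \left[ \frac{1}{M+1} \left( \frac{M+1}{M+2} \right)^{i} - \frac{1}{M} \left( \frac{M}{M+1} \right)^{i} \right] \tag{34} $$

In the examples that follow, I take enough steps that the live evidence provides a negligible contribution by the end, but the
formalism in this section can be used if the number of nested sampling steps is more modest.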
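The expressions in this section can be evaluated alongside the nested evidence in the same pass over the likelihood sampling. The following sketch is an illustration only, with a hypothetical function name; it assumes `live_L` holds the likelihoods of the $M$ live points remaining after step $k$. A useful sanity check is the case $k = 1$, where $Z_1^{\rm tot} = L_1 (1 - t_1) + \bar{\mathcal{L}}^{(1)} t_1$, so the variance must equal $(\bar{\mathcal{L}}^{(1)} - L_1)^2\,{\rm Var}(t)$.

```python
def total_evidence_moments(L, live_L, M):
    """Mean and variance of Z_k^tot = Z_k + Z_k^live over volume realisations.

    L      : likelihoods of the k points extracted so far, in step order
    live_L : likelihoods of the live points remaining after step k
    M      : number of live points
    """
    a = M / (M + 1.0)            # <t>
    b = (M + 1.0) / (M + 2.0)    # <t^2> / <t>
    a_k = b_k = 1.0
    first = second = inner = 0.0
    for L_k in L:
        a_k *= a
        b_k *= b
        inner += L_k * b_k
        first += L_k * a_k
        second += L_k * a_k * inner
    mean_nested = first / M                                        # eq. (17)
    var_nested = 2.0 * second / (M * (M + 1.0)) - mean_nested ** 2  # eq. (23)
    L_bar = sum(live_L) / len(live_L)                              # eq. (25)
    mean_live = L_bar * a_k                                        # eq. (27)
    var_live = L_bar ** 2 * a_k * (b_k - a_k)                      # eq. (29)
    # Covariance <Z_k Z_live> - <Z_k><Z_live>, from eqs. (32), (17) and (27).
    cross = L_bar * a_k * (inner / (M + 1.0) - first / M)
    delta = var_live + 2.0 * cross                                 # eq. (34)
    return mean_nested + mean_live, var_nested + delta             # eqs. (30), (33)

if __name__ == "__main__":
    # k = 1 check: variance should equal (L_bar - L_1)^2 * M / ((M+1)^2 (M+2)).
    print(total_evidence_moments([2.0], [5.0, 7.0], 2))
```

The covariance term is negative in this example, which reflects the fact that a larger remaining volume $X_k$ raises the live evidence while lowering the nested evidence.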



3 NUMERICAL RESULTS 

In this section I present numerical tests designed to assess the uncertainty estimators, and to investigate how many samples
to use. Since the nested sampling framework does not require any specific assumptions about the form of the likelihood
distribution, a Gaussian test case should be sufficient. However, I also consider a log-normal distribution as a check.

Figure 1. The histogram shows the distribution of evidence values (specifically $VZ$) for 1000 realisations of the volume sampling, given
a particular likelihood sampling. The red curve shows a Gaussian distribution whose mean and variance are computed from eqs. (17)
and (23). The mean and standard deviation from the simulations are $0.878 \pm 0.084$ (the mean differs from $VZ \approx 1$ for this particular
likelihood sampling). For comparison, the analytic average over volume realisations is 0.883, and Skilling's and my uncertainty estimators
yield 0.083 and 0.085, respectively.



3.1 Gaussian test case 

Consider a multivariate Gaussian likelihood specified by some mean vector and covariance matrix. With flat priors we can make
the following simplifications. Choose coordinates centred on the mean and aligned with the principal axes of the covariance
matrix. Scale each coordinate by the standard deviation in that direction. This yields a multivariate Gaussian in canonical
form,

$$ \mathcal{L}(\theta) = (2\pi)^{-d/2}\, e^{-|\theta|^2/2} \tag{35} $$

where $d$ is the number of dimensions. Let the prior volume be a cube of side length $s$ centred on the origin, so the prior volume
is $V = s^d$ and the priors are $\pi(\theta) = 1/V$. Thus the evidence is

$$ Z = V^{-1} \int_V (2\pi)^{-d/2}\, e^{-|\theta|^2/2}\, d\theta \tag{36} $$

If the prior box is large enough to encompass essentially all of the likelihood, then $VZ \approx 1$ independent of the box size. For
this reason, in the following tests I examine $VZ$ instead of just $Z$. The information gain for this case is

$$ H = \frac{1}{VZ} \int \mathcal{L}\, \ln \frac{\mathcal{L}}{Z}\, d\theta \approx -\frac{d}{2} \left( 1 + \ln 2\pi \right) - \ln Z \tag{37} $$

In the last step I again assume the prior box is large.

For concreteness, I use a box with side length $s = 10$ in $d = 4$ dimensions; these choices influence the quantitative details
but do not affect the general conclusions. In the fiducial case I use $M = 400$ live points and take $N_{\rm nest} = 4100$ steps (see
§ 3.4). The associated information gain is $H = 3.53$, and Skilling's estimator of the fractional uncertainty in the evidence has
an analytic value of $\sqrt{H/M} = 0.094$.
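The quoted numbers follow directly from eq. (37) in the large-box limit, where $VZ \approx 1$ and hence $\ln Z \approx -d \ln s$. The short check below is only a worked evaluation of that formula; the function name is hypothetical.

```python
import math

def gaussian_information_gain(d, s):
    """H of eq. (37) for a large prior cube, using ln Z ~ -d ln s (VZ ~ 1)."""
    return -0.5 * d * (1.0 + math.log(2.0 * math.pi)) + d * math.log(s)

H = gaussian_information_gain(d=4, s=10.0)
frac = math.sqrt(H / 400)          # eq. (10) with M = 400 live points
print(f"H = {H:.2f}, sqrt(H/M) = {frac:.3f}")   # H = 3.53, sqrt(H/M) = 0.094
```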



3.2 Testing the volume sampling 



I first generate a single realisation of the likelihood sampling and combine it with 1000 realisations of the volume sampling.
Figure 1 shows a histogram of the $VZ$ values from these direct simulations. The mean and standard deviation of the simulated
values are $0.878 \pm 0.084$. The mean differs from the theoretical value $VZ \approx 1$ by about $1.5\sigma$ for this particular realisation of
the likelihood sampling.

From eq. (17) the predicted average over volume realisations is 0.883. Skilling's estimator yields a statistical uncertainty
of 0.083, while mine yields 0.085. The predicted Gaussian distribution agrees well with the simulation results, indicating that
$Z$ has a (nearly) Gaussian distribution when the uncertainties are small (cf. §§ 2.2 and 2.3). I conclude that the analytic
expressions accurately describe the distribution of evidence values for many realisations of the volume sampling. It is striking
that the two uncertainty estimators yield very similar values despite having such different analytic forms.
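A test of this kind is straightforward to reproduce in outline. The sketch below is not the code behind Figure 1: it replaces the real likelihood sampling with a hypothetical stand-in, $L(X) = \exp(-c\sqrt{X})$ evaluated at $X_i = e^{-i/M}$, and then compares the mean and variance from many simulated volume samplings with eqs. (17) and (23). The function names and the values of $c$, $M$ and the number of steps are illustrative assumptions.

```python
import math
import random

def analytic_moments(L, M):
    """Mean and variance of Z over volume realisations (eqs. 17 and 23)."""
    a, b = M / (M + 1.0), (M + 1.0) / (M + 2.0)
    a_k = b_k = 1.0
    first = second = inner = 0.0
    for L_k in L:
        a_k *= a
        b_k *= b
        inner += L_k * b_k
        first += L_k * a_k
        second += L_k * a_k * inner
    mean = first / M
    return mean, 2.0 * second / (M * (M + 1.0)) - mean * mean

def simulated_moments(L, M, n_real, seed=0):
    """Sample mean and variance of Z from n_real random volume samplings."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_real):
        z, x_prev = 0.0, 1.0
        for L_i in L:
            x = x_prev * rng.random() ** (1.0 / M)   # eq. (7) with t from eq. (6)
            z += L_i * (x_prev - x)                  # eq. (4)
            x_prev = x
        values.append(z)
    mean = sum(values) / n_real
    var = sum((v - mean) ** 2 for v in values) / (n_real - 1)
    return mean, var

def toy_likelihoods(M, n_steps, c=20.0):
    """Hypothetical stand-in sampling: L(X) = exp(-c sqrt(X)) at X_i = exp(-i/M)."""
    return [math.exp(-c * math.sqrt(math.exp(-i / M))) for i in range(1, n_steps + 1)]

if __name__ == "__main__":
    M = 100
    L = toy_likelihoods(M, 1500)
    print("analytic :", analytic_moments(L, M))
    print("simulated:", simulated_moments(L, M, 1000))
```

Since eqs. (17) and (23) are exact averages over the $t_i$, any disagreement beyond the Monte Carlo noise would indicate a coding error rather than a limitation of the formulae.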



3.3 Testing the likelihood sampling 

It is useful to see how the results vary with different realisations of the likelihood sampling. I now generate 1000 random
likelihood samplings; for each one I compute the mean evidence averaged over all volume samplings using eq. (17). Figure 2
shows a histogram of the values of $\langle VZ \rangle_t$ for the different likelihood realisations (I add the subscript $t$ to emphasise that the
average is over volume samplings). The mean and standard deviation for the histogram are $1.005 \pm 0.094$. On average, nested
sampling recovers the evidence very well.

Figure 2. The histogram shows the distribution of $\langle VZ \rangle_t$ for 1000 realisations of the likelihood sampling (horizontal axis:
$\langle VZ \rangle_t$; vertical axis: probability). The notation $\langle VZ \rangle_t$ emphasises
that I average over all volume samplings for each likelihood sampling. The red curve shows a Gaussian distribution whose mean and
variance are computed from eqs. (17) and (23). The mean over all likelihood samplings is 1.005; the empirical scatter in the histogram
is 0.094, while the predicted value is 0.094 for Skilling's estimator and 0.096 for mine.

Figure 3. Development of the evidence as a function of step index. The blue band shows the mean and $1\sigma$ errors for the "nested
evidence" (eqs. 17 and 23), the red band shows the "live evidence" (eqs. 27 and 29), and the black curve shows the total (eqs. 30 and
33). The number of live points is $M = 100$ (left) and $M = 400$ (right). With more live points, it takes more steps to reach convergence,
but the ultimate uncertainty is smaller.

Strictly speaking, both of the uncertainty estimators depend on the likelihood sampling, but the scatter across the
likelihood realisations is < 9% so any single case provides a useful value. The average predicted uncertainty is 0.094 for
Skilling's estimator, and 0.096 for mine. Also, the predicted Gaussian distribution agrees well with the empirical histogram. I
conclude that both analytic estimators characterise the statistical uncertainty in the evidence quite well. It is not obvious
at this point why the two uncertainty estimators yield such similar results.



3.4 How many live points and steps? 

Let us now consider how to choose the number of live points, $M$, and the number of nested sampling steps, $N_{\rm nest}$. One general
goal is to have nested sampling "find" all significant modes in the posterior. The sampling procedure is basically guaranteed
to find the peak for a unimodal distribution, but if the live points are too sparse they may miss some peaks (especially small
ones) in a multi-modal distribution. In order to have a reasonable probability of getting at least one live point in each mode
at the outset, Feroz & Hobson (2008) suggest that the number of live points should exceed $V_{\rm prior}/V_{\rm min}$, where $V_{\rm prior}$ is the
volume spanned by the priors while $V_{\rm min}$ is the volume of the smallest mode (which must be estimated since it cannot actually
be known before the analysis is done).

The second consideration relates to achieving a robust and precise estimate of the evidence. Figure 3 shows the development
of the evidence as a function of the step index, for two choices of $M$. After some number of steps the evidence and
uncertainty saturate in the sense that taking additional steps does not significantly change the results. For a heuristic
understanding, note that as nested sampling homes in on a likelihood peak the likelihood values become constant ($L_i \to L_{\rm peak}$)
while the volumes become progressively smaller ($X_i \to 0$). For a rigorous proof of convergence, see Skilling (2009).

Figure 4. The points show the fractional uncertainty in the evidence versus the total number of likelihood evaluations ($N_{\rm tot} = N_{\rm nest} + M$)
for tests in which the number of live points is $M$ = 100, 200, 300, 400, 500, 600 (left to right). Here $N_{\rm nest}$ is determined for each $M$ using
the stopping threshold $\epsilon = 0.01$. The curve shows the scaling relation $\sigma_Z/Z \propto N_{\rm tot}^{-1/2}$.



The question arises of how to identify the point of diminishing returns. One simple possibility (Skilling 2004, 2006) is
to compare the evidence accumulated through nested sampling ($Z_k$ from eq. 5) with the remaining evidence estimated from
the live points ($Z_k^{\rm live}$ from eq. 26). Figure 3 shows $Z_k^{\rm live}$ versus $k$ in red. At early stages the curve shows a series of spikes: it
rises sharply when a new live point is found that (temporarily) dominates the average over live points, then declines as the
volume decreases. The curve smoothes out as the live points come to have more similar likelihoods, and then decays as the
likelihoods saturate while the remaining volume continues to decrease. Roughly speaking, $Z_k^{\rm live}$ represents the evidence that
has been "missed" by the nested sampling procedure, so we may want to continue the nested sampling until the ratio of live
to nested evidence falls below some threshold: $Z_k^{\rm live}/Z_k < \epsilon$.
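As a sketch of how such a stopping test might be coded, the function below evaluates the ratio using the mean volumes, i.e. $\langle Z_k \rangle$ from eq. (17) and $\langle Z_k^{\rm live} \rangle$ from eq. (27). Using the means rather than a single random volume sampling is a choice made here for reproducibility, and the function name is hypothetical.

```python
def live_to_nested_ratio(nested_L, live_L, M):
    """Ratio <Z_k^live> / <Z_k> after k = len(nested_L) steps.

    nested_L : likelihoods of the points extracted so far, in step order
    live_L   : likelihoods of the current live points
    M        : number of live points
    """
    a = M / (M + 1.0)
    k = len(nested_L)
    z_nested = sum(L_i * a ** i for i, L_i in enumerate(nested_L, start=1)) / M  # eq. (17)
    z_live = (sum(live_L) / len(live_L)) * a ** k                                 # eq. (27)
    return z_live / z_nested

# Tiny worked example: one extracted point with L = 1, two live points with L = 2.
ratio = live_to_nested_ratio([1.0], [2.0, 2.0], M=2)
print(ratio)  # 4.0: far above any sensible threshold, so sampling should continue
```

In a sampling loop the test would be applied every few steps, stopping once the ratio drops below the chosen $\epsilon$.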

Figure 3 illustrates that using more live points means it takes more steps to reach a given $\epsilon$ threshold, but the extra
computational effort is rewarded with a smaller statistical uncertainty. It is therefore interesting to compare the achieved
uncertainty with the computational effort, which we may measure as the total number of likelihood samples ($N_{\rm tot} = N_{\rm nest} + M$).
Figure 4 shows this comparison for different numbers of live points, given a fixed stopping threshold $\epsilon = 0.01$. The fractional
uncertainty clearly decreases with the total number of samples as $\sigma_Z/Z \propto N_{\rm tot}^{-1/2}$, just as expected for a statistical sampling
procedure (Skilling 2004, 2006; Chopin & Robert 2008).

In the examples presented here, I have used a low $\epsilon$ threshold to require that the live evidence be negligible at the end
of the run. Figure 3 suggests, however, that $\epsilon$ could be set higher provided that the live evidence is accounted for properly
(using the methods in § 2.4).

The lessons here are familiar from previous work on nested sampling, but worth reiterating. The ultimate statistical
uncertainty depends mainly on the number of live points. Once the nested sampling procedure has converged (as measured,
for example, by the $\epsilon$ threshold), running more steps will not improve the results. The way to reduce the uncertainties is
to increase the number of live points.^2 That will increase the number of steps it takes to reach convergence, but will yield
uncertainties that scale as $\sigma_Z \propto N_{\rm tot}^{-1/2}$.



3.5 Log-normal test case

Nowhere in the theoretical framework was it necessary to specify the form of the likelihood, so the Gaussian test case should 
be sufficient to validate the analytic results. Nevertheless, it is useful to consider a different test to verify that the results 
are indeed robust. I use a multivariate log-normal distribution because it is skewed and non-Gaussian but still analytically 
tractable. Choosing appropriate scaled coordinates, we can write the likelihood in canonical form,

$$ \mathcal{L}(\theta) = \prod_{i=1}^{d} \frac{1}{(2\pi)^{1/2}\, \theta_i} \exp\!\left[ -\frac{(\ln \theta_i)^2}{2} \right] \tag{38} $$

where $d$ is the number of dimensions. Let the prior volume be the cube with $0 < \theta_i < s$, so $V = s^d$ and the evidence is

$$ Z = V^{-1} \prod_{i=1}^{d} \int_0^s \frac{1}{(2\pi)^{1/2}\, \theta_i} \exp\!\left[ -\frac{(\ln \theta_i)^2}{2} \right] d\theta_i \tag{39} $$

^2 It is not necessary to start from scratch in order to increase the number of live points. Skilling (2006) explains that independent runs
with $M_1, M_2, \ldots$ live points can be merged into a joint run that effectively has $M_1 + M_2 + \cdots$ live points. The likelihood samplings are
simply merged and sorted, while the volume sampling must be recomputed.

For a large prior box, $VZ \approx 1$ and the information gain is

$$ H = \frac{1}{VZ} \int \mathcal{L}\, \ln \frac{\mathcal{L}}{Z}\, d\theta \approx -\frac{d}{2} \left( 1 + \ln 2\pi \right) - \ln Z \tag{40} $$


I again work in $d = 4$ dimensions, but now use a box with $s = 20$ to encompass the bulk of the likelihood. Given the larger
volume, I use $M = 600$ live points and take $N_{\rm nest} = 9000$ steps. For these parameter choices, the information gain is $H = 6.31$
and Skilling's estimator for the fractional uncertainty in the evidence has an analytic value of $\sqrt{H/M} = 0.103$.

I first consider a single likelihood sampling and examine the distribution of evidence values for 1000 volume realisations
(cf. § 3.2). The empirical mean and standard deviation over the volume samplings are $0.939 \pm 0.097$. The analytic mean is
0.944, while Skilling's and my estimators predict uncertainties of 0.097 and 0.098, respectively. The histogram of $Z$ values
(not shown) agrees well with a Gaussian distribution whose mean and variance are given by eqs. (17) and (23). (Note that $Z$
can have a nearly-Gaussian distribution even if the likelihood is non-Gaussian.)

I next consider 1000 likelihood samplings and examine the distribution of $\langle VZ \rangle_t$ values (cf. § 3.3). The empirical mean
and standard deviation are $0.997 \pm 0.106$. Skilling's and my estimators predict uncertainties of 0.102 and 0.103, respectively.
The histogram of $\langle VZ \rangle_t$ again agrees well with the predicted Gaussian distribution. I conclude that the analytic results are
reliable even for a non-Gaussian likelihood distribution.



4 SUMMARY 

I have derived simple analytic expressions for the mean and variance of the Bayesian evidence over all realisations of the
volume sampling in nested sampling, and compared them with the uncertainty estimator introduced by Skilling (2004, 2006)
from an information theoretic argument. The two estimators have different forms as sums over the likelihood sampling, yet
they yield very similar quantitative results. At this point it is not clear whether the agreement reflects some general equivalence
between the two estimators that is not yet apparent, or whether it somehow depends on statistical properties of the likelihood
sampling $\{L_i\}$ that emerges from the nested sampling procedure. The moments-based estimator currently has a more rigorous
foundation than the information theoretic estimator, but both are useful and it will be interesting to see if they continue
to give similar results as nested sampling is applied to a broader range of problems, and if any formal equivalence can be
established. Both estimators can be used to compute the statistical uncertainty in the evidence for any implementation that
maintains the core prescription of nested sampling: each new point is drawn from the prior distribution in the region inside
the current likelihood surface. With these results, determining not only the mean evidence but also the uncertainty requires
no additional computational effort (and no guesswork) beyond that needed to generate the likelihood sampling.